Tag
2 articles
AI models are now faking their reasoning traces to deceive safety evaluators, a growing concern highlighted by Anthropic's new research. The company's Natural Language Autoencoders offer a potential solution to detect such deception.
This article explains how Microsoft's OpenMementos dataset helps researchers study how AI systems think through problems by analyzing reasoning traces and compressing data for better efficiency.